Information capacity in the weak-signal approximation 
We derive an approximate expression for mutual information in a broad class of discrete-time
stationary channels with continuous input, under the constraint of vanishing input amplitude or 
power. The approximation describes the input by its covariance matrix, while the channel properties 
are described by the Fisher information matrix. This separation of input and channel properties 
allows us to analyze the optimality conditions in a convenient way. We show that input correlations 
in memoryless channels do not affect channel capacity since their effect decreases fast with vanishing 
input amplitude or power. On the other hand, for channels with memory, properly matching the 
input covariances to the dependence structure of the noise may lead to almost noiseless information 
transfer, even for intermediate values of the noise correlations. Since many model systems described 
in mathematical neuroscience and biophysics operate in the high noise regime and weak-signal 
conditions, we believe that the described results are of potential interest also to researchers in these
areas. 

PACS numbers: 89.90.+n, 89.70.Kn 



I. INTRODUCTION 

Information theory is a mathematical framework that provides tools for quantification of information content and information transfer in systems defined by general probabilistic rules [1]. The theory has been applied successfully to a wide range of problems [2], including, e.g., classical and quantum computation and communication, optical communication, or quantification of different aspects of information processing in real neurons and neuronal models.

The measure of information transfer in information theory is represented by a nonlinear functional of the probability measure over the joint input-output space [1]. The concavity of this functional in the input probability measure has important implications for numerical approaches to finding the information optimality conditions [1, 13-18]. On the other hand, approximations or even closed-form solutions are quite rare. The classical exact solution for the linear channel with additive (possibly non-white) Gaussian noise [1, 19] and input power constraint has been applied in many different situations. However, in many cases of interest the channel is significantly nonlinear or non-Gaussian, or there are different input constraints [20], and one has to rely on numerical solutions or approximations.

Approximations allow us to investigate, although only locally and under a perhaps restrictive scenario, the effect of individual components of the system on the optimality conditions. In particular, if the noise in the information transfer is sufficiently low and regular, there exists a tight lower bound on the information optimality conditions (denoted as the low-noise approximation in this paper), which has been investigated in [12, 21-23]. In this paper we continue the effort started in [24] and describe essentially the opposite situation: the high-noise approximation. Such an approximation is of interest when the signal is very weak compared to the noise in the information transfer, for example, as in the classical stochastic resonance effect observed in electrosensory neurons [24, 25].



II. MEASURES OF INFORMATION 

Throughout this paper we assume the discrete-time setting [5]. We denote the consecutive channel outputs (responses) as a vector of random variables (r.v.) $\mathbf{R} = (\{R_i\}_{i=1}^{n})^{\mathsf T}$, which may be discrete or continuous; $i$ indexes the time and $(\cdot)^{\mathsf T}$ denotes transposition. The response, $R_i = r_i$, results from the corresponding input $\Theta_i = \theta_i$, where the input is also described by an $n$-dimensional r.v. $\boldsymbol{\Theta}$. The multidimensional description of the process of information transfer between $\boldsymbol{\Theta}$ and $\mathbf{R}$ allows us to include the effect of memory, i.e., the dependence on current and also on past inputs and responses. We also assume that the input alphabet is continuous [1]. In the following we consider stationary channels fully described by the conditional probability density function (p.d.f.) $f(\mathbf{r}|\boldsymbol{\theta})$, which generally factorizes as [26]

$$f(\mathbf{r}|\boldsymbol{\theta}) = \prod_{i=1}^{n} f_i(r_i \,|\, \theta_i, \theta_{i-1}, \ldots, \theta_1, r_{i-1}, \ldots, r_1). \tag{1}$$

We do not consider channel feedback, i.e., the dependence of the current input on past responses [1].

Mutual information (MI) is the fundamental quantity measuring information transfer in channels [1]. The MI $I(\boldsymbol{\Theta}; \mathbf{R})$ gives the degree of statistical dependence between inputs and responses, and is defined as

$$I(\boldsymbol{\Theta}; \mathbf{R}) = \left\langle D_{\mathrm{KL}}\left[ f(\mathbf{r}|\boldsymbol{\theta}) \,\|\, p(\mathbf{r}) \right] \right\rangle_{\boldsymbol{\theta}}, \tag{2}$$

where

$$p(\mathbf{r}) = \left\langle f(\mathbf{r}|\boldsymbol{\theta}) \right\rangle_{\boldsymbol{\theta}} \tag{3}$$

is the marginal joint p.d.f. of responses, and the averaging is with respect to the input p.d.f., $\pi(\boldsymbol{\theta})$. The Kullback-Leibler (KL) divergence is defined as

$$D_{\mathrm{KL}}\left[ f(\mathbf{r}|\boldsymbol{\theta}) \,\|\, p(\mathbf{r}) \right] = \left\langle \ln \frac{f(\mathbf{r}|\boldsymbol{\theta})}{p(\mathbf{r})} \right\rangle_{\mathbf{r}|\boldsymbol{\theta}}, \tag{4}$$

where the averaging is with respect to $f(\mathbf{r}|\boldsymbol{\theta})$. It follows from Eq. (2) that MI is a property of the joint distribution of stimuli and responses. Of particular interest are the optimality conditions for information transfer, that is, the maximum value of $I(\boldsymbol{\Theta}; \mathbf{R})$ and the corresponding optimal $\pi(\boldsymbol{\theta})$. In order to have a well-posed problem, one is interested in the optimality conditions for $\boldsymbol{\Theta}$ satisfying certain additional constraints, e.g., on the average power or on the range of inputs [1, 20]. The maximum value of MI per channel use, taken over all possible stimulus distributions satisfying the constraints $G$, is denoted as the information capacity, $C$, defined as [20]

$$C = \lim_{n \to \infty} \frac{1}{n} \sup_{\pi(\boldsymbol{\theta}) \in G} I(\boldsymbol{\Theta}; \mathbf{R}). \tag{5}$$

In this paper we interpret $C$ as the upper bound on the rate at which information can be transmitted reliably [1], without considering the complexity of achieving such a maximum rate in practical terms. Specifically, we do not discuss the properties of any particular coding and decoding schemes [5].

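Whenever we are interested in the reliability of input-output transmission, we naturally enter the domain of statistical estimation theory [27]. The Fisher information (FI) matrix, defined as

$$\mathbf{J}(\boldsymbol{\theta}|\mathbf{R}) = \left\langle \left[\nabla \ln f(\mathbf{r}|\boldsymbol{\theta})\right] \left[\nabla \ln f(\mathbf{r}|\boldsymbol{\theta})\right]^{\mathsf T} \right\rangle_{\mathbf{r}|\boldsymbol{\theta}}, \tag{6}$$

where

$$\nabla = \left( \frac{\partial}{\partial \theta_1}, \ldots, \frac{\partial}{\partial \theta_n} \right)^{\mathsf T}, \tag{7}$$

imposes limits on the precision of estimating $\boldsymbol{\theta}$ from the responses by means of the Cramér-Rao bound, which states that the variance of any unbiased estimator $\hat{\theta}_i$ of $\theta_i$ satisfies $\mathrm{Var}(\hat{\theta}_i) \ge [\mathbf{J}^{-1}(\boldsymbol{\theta}|\mathbf{R})]_{ii}$ [27]. Generally, FI requires that $f(\mathbf{r}|\boldsymbol{\theta})$ is continuously differentiable in $\boldsymbol{\theta}$ [27]. In this paper, we additionally assume that $f(\mathbf{r}|\boldsymbol{\theta})$ is twice continuously differentiable in $\boldsymbol{\theta}$, so that the following conditions hold:

$$\int_{\mathbf{R}} \nabla f(\mathbf{r}|\boldsymbol{\theta})\, d\mathbf{r} = 0, \qquad \int_{\mathbf{R}} \nabla \nabla^{\mathsf T} f(\mathbf{r}|\boldsymbol{\theta})\, d\mathbf{r} = 0. \tag{8}$$

There is a variety of relationships between FI, MI and KL divergence established in the literature [1, 28, 29], further motivated by the fields of information geometry [30] or stochastic complexity [31]. The already mentioned low-noise approximation to MI is constructed by employing the Cramér-Rao bound [12, 21-23]. Although we demonstrate that the high-noise approximation also involves FI, we never employ the Cramér-Rao bound, and the appearance of FI is due to certain asymptotic properties of the KL divergence [28].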



III. INFORMATION TRANSFER BY WEAK 
SIGNALS 

A. Small input amplitude limit 

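The channel properties are described by the conditional probability density $f(\mathbf{r}|\boldsymbol{\theta})$, which satisfies the regularity conditions (8). The input, described by r.v. $\boldsymbol{\Theta}$, is restricted in amplitude,

$$\boldsymbol{\Theta} \in [\boldsymbol{\theta}_0 - \Delta\boldsymbol{\theta},\, \boldsymbol{\theta}_0 + \Delta\boldsymbol{\theta}] \tag{9}$$

for chosen $\boldsymbol{\theta}_0$ and $\Delta\boldsymbol{\theta}$, or more precisely in components: for all $i$ it holds that $\Theta_i \in [\theta_0 - \Delta\theta,\, \theta_0 + \Delta\theta]$ and $\Delta\theta > 0$. The situation for a memoryless channel is illustrated in Fig. 1. The goal is to derive an approximation to mutual information in the limit $\|\Delta\boldsymbol{\theta}\| \to 0$. We demonstrate in detail in Appendix A that the approximation (to second order in the input amplitude) can be written as

$$I(\boldsymbol{\Theta}; \mathbf{R}) \approx \frac{1}{2} \operatorname{tr}\left[ \mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})\, \mathbf{C}_{\Theta} \right], \tag{10}$$

where $\mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})$ is the FI matrix from Eq. (6) evaluated at $\boldsymbol{\theta} = \boldsymbol{\theta}_0$, $\mathbf{C}_{\Theta}$ is the covariance matrix of $\boldsymbol{\Theta}$, and $\operatorname{tr}(\cdot)$ is the matrix trace. Eq. (10), derived also in [24], holds for a broad class of channels with memory, both biologically inspired and artificial, and represents the main result. An important feature of Eq. (10) is that the channel properties (described by the FI matrix) and the input properties (described by its covariance matrix) are separated. Therefore, the maximum value of MI can be found by matching the corresponding elements of $\mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})$ and $\mathbf{C}_{\Theta}$. The elements of the covariance matrix of $\boldsymbol{\Theta}$ can be written as [32]

$$[\mathbf{C}_{\Theta}]_{ik} = \sigma^2 \varrho_{ik}, \tag{11}$$

where $\sigma^2 = \sqrt{\mathrm{Var}(\Theta_i)\,\mathrm{Var}(\Theta_k)}$ is constant for all $i, k$ due to stationarity, and $\varrho_{ik} = \mathrm{corr}(\Theta_i, \Theta_k)$ is the correlation coefficient. The maximum variance of the amplitude-constrained input from Eq. (9) is $\max \sigma^2 = (\Delta\theta)^2$, and $-1 \le \varrho_{ik} \le 1$, thus $I(\boldsymbol{\Theta}; \mathbf{R})$ in Eq. (10) is maximized if

$$\varrho_{ik} = \operatorname{sgn}\,[\mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})]_{ik}, \tag{12}$$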



where $\operatorname{sgn}(\cdot)$ is the sign function. Note that the diagonal elements of the FI matrix are positive, while the off-diagonal elements can be negative. It may happen that the matrix $\mathbf{C}_{\Theta}$ formed by Eqs. (11) and (12) is not positive semidefinite [33], i.e., it cannot be a proper covariance matrix [34], even though $\mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})$ generally is positive semidefinite [27]. However, in all problems we have calculated so far, a proper input covariance matrix could be formed, given $\mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})$, and it then follows from Eqs. (10)-(12) that

$$C \approx C_{\mathrm{high}} = \lim_{n \to \infty} \frac{(\Delta\theta)^2}{2n} \sum_{i,k=1}^{n} \left| [\mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})]_{ik} \right|, \tag{13}$$

where $C_{\mathrm{high}}$ denotes the high-noise approximation to the true capacity $C$.

[Figure 1 plot labels: input signal; $f(r|\theta_0 + \Delta\theta)$; $f(r|\theta_0 - \Delta\theta)$.]

FIG. 1. Information transmission with amplitude-constrained inputs. The input signal, described by r.v. $\Theta$, is restricted to the interval $[\theta_0 - \Delta\theta,\, \theta_0 + \Delta\theta]$. Due to the presence of noise, the responses to each particular $\theta$ vary randomly, as described by the conditional probability density $f(r|\theta)$. While the memoryless information channel is fully described by $f(r|\theta)$, the amount of information transferred depends on both $f(r|\theta)$ and the distribution of $\Theta$. We examine the maximum information transfer by inputs restricted to small amplitudes, when there is a significant overlap of $f(r|\theta_0 - \Delta\theta)$ and $f(r|\theta_0 + \Delta\theta)$. Heuristically, the problem can also be described as information transmission in a very noisy environment, or under very low signal-to-noise ratio conditions.
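The matching rule of Eqs. (10)-(12) can be illustrated numerically. The following minimal Python sketch is not part of the original derivation: the function names and the 2x2 Fisher information matrix are hypothetical, chosen only to show how the trace formula, the sign-matched covariance and the positive-semidefiniteness check fit together for a finite number of channel uses.

```python
import numpy as np

def weak_signal_mi(fisher, cov):
    """Second-order MI approximation of Eq. (10), 0.5 * tr(J C), in nats."""
    return 0.5 * np.trace(fisher @ cov)

def sign_matched_covariance(fisher, delta_theta):
    """Covariance from Eqs. (11)-(12): variance (delta_theta)^2, correlations sgn(J_ik)."""
    return delta_theta**2 * np.sign(fisher)

def is_valid_covariance(cov, tol=1e-12):
    """A proper covariance matrix is symmetric and positive semidefinite."""
    return bool(np.allclose(cov, cov.T) and np.linalg.eigvalsh(cov).min() >= -tol)

# Hypothetical FI matrix for two channel uses (illustrative values only).
J = np.array([[1.0, -0.6],
              [-0.6, 1.0]])
delta_theta = 0.1

C_theta = sign_matched_covariance(J, delta_theta)
mi = weak_signal_mi(J, C_theta)                      # total MI over n = 2 uses
mi_from_sum = 0.5 * delta_theta**2 * np.abs(J).sum() # finite-n form of Eq. (13), times n

print("valid covariance:", is_valid_covariance(C_theta))
print("0.5 tr(J C)      :", mi)
print("sum-of-|J| form  :", mi_from_sum)
```

When some elements of the FI matrix are exactly zero, `np.sign` returns zero correlations there, which need not give a positive semidefinite matrix; in that case the corresponding correlations do not enter Eq. (10) and may be chosen freely to restore validity.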

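For stationary memoryless channels $f(\mathbf{r}|\boldsymbol{\theta})$ factorizes due to Eq. (1) as [1, p. 193]

$$f(\mathbf{r}|\boldsymbol{\theta}) = \prod_{i=1}^{n} f(r_i|\theta_i), \tag{14}$$

thus it follows from Eq. (6) that the FI matrix is diagonal with equal diagonal elements, $J(\theta_0|R) \equiv [\mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})]_{ii} = \left\langle [\partial_\theta \ln f(r|\theta)]^2 \right\rangle_{r|\theta_0}$, and from Eq. (13) we have

$$C_{\mathrm{high}} = \frac{(\Delta\theta)^2}{2}\, J(\theta_0|R), \tag{15}$$

a result obtained by different means in [35]. The optimal input p.d.f., $\pi^*(\theta)$, is the maximum-variance distribution over the given input range,

$$\pi^*(\theta) = \frac{1}{2}\,\delta(\theta - \theta_0 - \Delta\theta) + \frac{1}{2}\,\delta(\theta - \theta_0 + \Delta\theta), \tag{16}$$

where $\delta(\cdot)$ is the Dirac delta function. In other words, the capacity is achieved by a binary input, and thus $C \le 1$ bit.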

It follows from Eq. (10) that the off-diagonal elements of $\mathbf{C}_{\Theta}$ do not affect the information capacity of memoryless channels in the vanishing input amplitude case. This result is counterintuitive, because correlations generally decrease the input entropy [1]. Therefore, in the following we provide a proof which is independent of Eq. (10). Let us consider two consecutive uses of a stationary memoryless channel, i.e., $\boldsymbol{\Theta} = (\Theta_1, \Theta_2)^{\mathsf T}$, $\mathbf{R} = (R_1, R_2)^{\mathsf T}$. We assume that the inputs $\Theta_1$ and $\Theta_2$ are generally statistically dependent, $(\Theta_1, \Theta_2) \sim \pi(\theta_1, \theta_2)$, and the joint marginal distribution of responses is denoted as $p(\mathbf{r})$, see also Eq. (3). By employing the factorization (14) and basic relations between entropy, $h(\mathbf{R}) = -\langle \ln p(\mathbf{r}) \rangle_{\mathbf{r}}$, and MI [1, p. 21] we have

$$\begin{aligned}
I(\boldsymbol{\Theta}; \mathbf{R}) &= h(\mathbf{R}) - \langle h(\mathbf{R}|\boldsymbol{\theta}) \rangle_{\boldsymbol{\theta}} \\
&= h(R_1) + h(R_2) - I(R_1; R_2) - \langle h(R_1|\theta_1) + h(R_2|\theta_2) \rangle_{\boldsymbol{\theta}} \\
&= I(\Theta_1; R_1) + I(\Theta_2; R_2) - I(R_1; R_2) \\
&= 2 I(\Theta_1; R_1) - I(R_1; R_2),
\end{aligned} \tag{17}$$



since $I(\Theta_1; R_1) = I(\Theta_2; R_2)$ due to stationarity. In other words, the difference in information transfer when using two dependent or independent inputs in the memoryless channel is equal to $I(R_1; R_2)$. Obviously, for $\Theta_1, \Theta_2$ independent it holds that $I(R_1; R_2) = 0$. The strength of the dependence between $R_1$ and $R_2$ for correlated inputs depends on the input range and the conditional response distributions, see Fig. 1. We expect $I(R_1; R_2)$ to be maximal for the extreme input dependence, e.g., $\Theta_2 = \Theta_1$, where $\Theta_1$ is equiprobably equal either to $\theta_0 - \Delta\theta$ or $\theta_0 + \Delta\theta$. It follows that $R_1, R_2$ are conditionally (given $\Theta_1$) independent and identically distributed. If $f(r|\theta_0 - \Delta\theta)$ and $f(r|\theta_0 + \Delta\theta)$ are well separated, then $I(R_1; R_2) > 0$ because $R_2$ provides redundant information to $R_1$. As $\Delta\theta \to 0$, $f(r|\theta_0 - \Delta\theta)$ and $f(r|\theta_0 + \Delta\theta)$ become (almost) identical due to continuity in $\theta$, and thus $I(R_1; R_2) \to 0$. To make the argument precise, we show that $I(R_1; R_2) = 0$ to the second order in the input amplitude, so that the effect of input correlations in memoryless channels is of higher order than the approximate Eq. (10). The joint response distribution is

$$p(r_1, r_2) = \frac{1}{2} f(r_1|\theta_0 + \Delta\theta) f(r_2|\theta_0 + \Delta\theta) + \frac{1}{2} f(r_1|\theta_0 - \Delta\theta) f(r_2|\theta_0 - \Delta\theta), \tag{18}$$

from which the marginals follow, $p(r_1) = f(r_1|\theta_0 + \Delta\theta)/2 + f(r_1|\theta_0 - \Delta\theta)/2$, and similarly for $p(r_2)$. We employ another formula for MI [1, p. 251],

$$I(R_1; R_2) = D_{\mathrm{KL}}\left[ p(r_1, r_2) \,\|\, p(r_1) p(r_2) \right]. \tag{19}$$

By substituting from Eq. (18) into Eq. (19), and by employing the Taylor expansion in $\Delta\theta$ around $\Delta\theta = 0$, we have (the terms up to first order in $\Delta\theta$ are zero)

$$I(R_1; R_2) \approx (\Delta\theta)^2 \int_{R_1 \times R_2} \left. \frac{\partial f(r_1|\theta)}{\partial \theta} \frac{\partial f(r_2|\theta)}{\partial \theta} \right|_{\theta = \theta_0} dr_1\, dr_2,$$

which is equal to zero, due to Eq. (8). The first nonzero term is of 4th order, and can be written as $(\Delta\theta)^4 J(\theta_0|R_1) J(\theta_0|R_2)/2$, provided that $f(r|\theta)$ is three times continuously differentiable in $\theta$.
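The fourth-order scaling of $I(R_1; R_2)$ can be checked numerically for a concrete channel. The sketch below is only an illustration under stated assumptions: it takes a unit-variance Gaussian channel with $\theta_0 = 0$ (so that the Fisher information equals 1), fully dependent binary inputs, and evaluates the KL divergence between the joint response density and the product of its marginals by a Riemann sum on a grid. The function names, grid width and step are choices made here, not quantities from the text.

```python
import numpy as np

def gauss(r, mean):
    """Unit-variance Gaussian density f(r | theta = mean)."""
    return np.exp(-0.5 * (r - mean) ** 2) / np.sqrt(2.0 * np.pi)

def response_mi(delta_theta, half_width=9.0, step=0.03):
    """I(R1; R2) in nats for Theta2 = Theta1 = +/- delta_theta with equal probability."""
    r = np.arange(-half_width, half_width + step, step)
    f_plus, f_minus = gauss(r, delta_theta), gauss(r, -delta_theta)
    # Joint response density for fully dependent binary inputs.
    joint = 0.5 * np.outer(f_plus, f_plus) + 0.5 * np.outer(f_minus, f_minus)
    marginal = 0.5 * (f_plus + f_minus)
    product = np.outer(marginal, marginal)
    # KL divergence between the joint density and the product of marginals.
    return np.sum(joint * np.log(joint / product)) * step**2

for a in (0.05, 0.1, 0.2):
    predicted = 0.5 * a**4          # (delta_theta)^4 * J * J / 2 with J = 1
    print(a, response_mi(a), predicted)
```

For small amplitudes the numerical value should approach the predicted leading term, while the ratio drifts away from one as the amplitude grows and higher-order terms become relevant.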

On the other hand, for channels with memory the input correlations do matter, irrespective of the smallness of the amplitude. Consider, for example, two channel uses in the additive noise case, $R_i = \Theta_i + Z_i$, $\langle Z_i \rangle = 0$, where $i = 1, 2$. It is possible to approach the noiseless channel in the extreme case of matching input and noise correlations in accord with Eq. (10), e.g., if $\mathrm{corr}(Z_1, Z_2) \to -1$ and $\mathrm{corr}(\Theta_1, \Theta_2) \to 1$, then $R_1 = \Theta_1 + Z_1$ and $R_2 = \Theta_1 - Z_1$, and so by adding $R_1 + R_2$ we can recover the value of $\Theta_1$ perfectly.
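The noise-cancellation argument for two channel uses can be made concrete with a short simulation. This is a minimal sketch under assumed values (binary input amplitude, standard normal noise, sample size and seed are all illustrative): the two noise samples are perfectly anticorrelated and the two inputs are identical, so averaging the responses removes the noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 10_000
delta_theta = 0.01                      # weak signal, far below the noise level

theta1 = rng.choice([-delta_theta, delta_theta], size=n_trials)
theta2 = theta1.copy()                  # corr(Theta_1, Theta_2) = 1
z1 = rng.standard_normal(n_trials)
z2 = -z1                                # corr(Z_1, Z_2) = -1

r1 = theta1 + z1
r2 = theta2 + z2

theta_hat = 0.5 * (r1 + r2)             # the noise cancels in the sum
max_error = np.max(np.abs(theta_hat - theta1))
print("largest reconstruction error:", max_error)
```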



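B. Small input power limit

The signal power [36], $P_{\Theta}$, of an input signal described by r.v. $\boldsymbol{\Theta}$ is defined as

$$P_{\Theta} = \frac{1}{n} \left\langle \boldsymbol{\Theta}^{\mathsf T} \boldsymbol{\Theta} \right\rangle. \tag{20}$$

For the covariance matrix $\mathbf{C}_{\Theta}$ of r.v. $\boldsymbol{\Theta}$ it holds that $\mathbf{C}_{\Theta} = \left\langle (\boldsymbol{\Theta} - \langle \boldsymbol{\Theta} \rangle)(\boldsymbol{\Theta} - \langle \boldsymbol{\Theta} \rangle)^{\mathsf T} \right\rangle$, and therefore

$$P_{\Theta} = \frac{1}{n} \operatorname{tr} \mathbf{C}_{\Theta} + \frac{1}{n} \left\| \langle \boldsymbol{\Theta} \rangle \right\|^2. \tag{21}$$

The information channel is constrained in the input power $P$ if only inputs that satisfy $P \ge P_{\Theta}$ are considered. It is common in the information theory of power-constrained channels to assume $\langle \boldsymbol{\Theta} \rangle = 0$, so that $P_{\Theta} = \operatorname{tr} \mathbf{C}_{\Theta}/n$ [1, p. 277], which we assume here also. The assumption $\langle \boldsymbol{\Theta} \rangle = 0$ results in simpler notation and does not affect the generality of the results. Due to stationarity, the marginal variances of r.v. $\boldsymbol{\Theta}$ are constant, $\mathrm{Var}(\Theta_i) = \mathrm{const.}$ for all $i$, thus we can write

$$\boldsymbol{\Theta} = \varepsilon \tilde{\boldsymbol{\Theta}}, \tag{22}$$

where $\mathrm{Var}(\tilde{\Theta}_i) = 1$ and $\varepsilon > 0$ is the scaling factor. The power of the input is then $P_{\Theta} = \varepsilon^2$, and the vanishing input power is achieved by $\varepsilon \to 0$.

The approximate expression for MI in the vanishing input power limit is obtained analogously to the proof presented in Appendix A, by expressing $I(\boldsymbol{\Theta}; \mathbf{R})$ in terms of the auxiliary r.v. $\tilde{\boldsymbol{\Theta}}$, and then expanding for $\varepsilon \to 0$ around $\varepsilon = 0$. Let $\boldsymbol{\Theta} \sim \pi(\boldsymbol{\theta})$ and $\tilde{\boldsymbol{\Theta}} \sim g(\tilde{\boldsymbol{\theta}})$; then it follows from Eq. (22) that $\pi(\boldsymbol{\theta}) = g(\boldsymbol{\theta}/\varepsilon)/\varepsilon = g(\tilde{\boldsymbol{\theta}})/\varepsilon$, and also $d\boldsymbol{\theta} = \varepsilon\, d\tilde{\boldsymbol{\theta}}$. The MI can be written (analogously to Eq. (A2)) as

$$I(\boldsymbol{\Theta}; \mathbf{R}) = \left\langle D_{\mathrm{KL}}\left[ f(\mathbf{r}|\varepsilon\tilde{\boldsymbol{\theta}}) \,\|\, \langle f(\mathbf{r}|\varepsilon\tilde{\boldsymbol{\theta}}) \rangle_{\tilde{\boldsymbol{\theta}}} \right] \right\rangle_{\tilde{\boldsymbol{\theta}}}. \tag{23}$$

The rest follows the argument of Appendix A, although simplified due to $\langle \boldsymbol{\Theta} \rangle = 0$. It is obvious from the general proof that the assumption of zero $\langle \boldsymbol{\Theta} \rangle$ is not essential, only that the vanishing input power is then taken with respect to $\langle \boldsymbol{\Theta} \rangle$, so that $\operatorname{tr} \mathbf{C}_{\Theta}/n$ is the vanishing power of input fluctuations. Nevertheless, the approximation is the same in both cases and reads

$$I(\boldsymbol{\Theta}; \mathbf{R}) \approx \frac{1}{2} \operatorname{tr}\left[ \mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})\, \mathbf{C}_{\Theta} \right] = \frac{\varepsilon^2}{2} \operatorname{tr}\left[ \mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})\, \mathbf{C}_{\tilde{\Theta}} \right], \tag{24}$$

where $\langle \boldsymbol{\Theta} \rangle = \boldsymbol{\theta}_0$.

Eqs. (10) and (24) are identical, although the assumptions on $\boldsymbol{\Theta}$ are different. Consider for example the memoryless channel with power constraint $P \ge \varepsilon^2$ on the input and $\langle \Theta \rangle = 0$, so that Eq. (24) can be written as

$$I(\Theta; R) \approx \frac{\varepsilon^2}{2}\, J(0|R). \tag{25}$$

The capacity is achieved by any distribution of inputs with power $P_{\Theta} = \varepsilon^2 = P$, for example by the discrete distribution from Eq. (16) with $\Delta\theta = \sqrt{P}$, or by the Gaussian distribution $\mathcal{N}(0, P)$. Specifically, it is well known that the capacity of a power-constrained linear additive white Gaussian noise (AWGN) channel is [1]

$$C = \frac{1}{2} \ln\left( 1 + \frac{P}{N} \right), \tag{26}$$

where $P$ is the power constraint on the input and $N$ is the noise power, and that the capacity is achieved by a normal distribution $\mathcal{N}(0, P)$. The signal-to-noise ratio (SNR) is then defined as $\mathrm{SNR} = P/N$. By expanding Eq. (26) to first order in $P$ for $P \ll N$ we have $C \approx P/(2N)$, which corresponds exactly to Eq. (25), since for the Gaussian additive noise it holds that $J(\theta|R) = 1/N$. A detailed review of AWGN channel capacity and its different approximations for different SNR regimes (including the high-noise approximation above) can be found in [37]. The conclusion that in the vanishing input power limit the capacity of the AWGN channel can be achieved by both discrete and $\mathcal{N}(0, P)$ distributions is not so surprising in the light of some recent research on AWGN channels [38]. It has been shown that although the optimal input distribution is generally $\mathcal{N}(0, P)$, the capacity can be nearly achieved by a discrete distribution, and specifically, if $P \ll N$, the other possible capacity-bearing distribution is indeed binary discrete. The methods employed in [38] are, however, different from our approach. We further discuss the compatibility of Eq. (24) with the exact results obtained for non-white AGN channels in the low input power regime in the Results section of this paper.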
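The agreement between the exact AWGN capacity and its weak-power approximation is easy to inspect numerically. The following sketch simply compares the well-known formula $\frac{1}{2}\ln(1 + P/N)$ with the first-order expression $P/(2N)$; the function names and the sampled power values are illustrative choices made here.

```python
import numpy as np

def awgn_capacity(power, noise_power=1.0):
    """Exact capacity of the power-constrained AWGN channel, in nats per use."""
    return 0.5 * np.log1p(power / noise_power)

def high_noise_capacity(power, noise_power=1.0):
    """First-order (weak-signal) approximation P / (2N), using J = 1/N."""
    return 0.5 * power / noise_power

for power in (1e-3, 1e-2, 1e-1, 1.0):
    exact = awgn_capacity(power)
    approx = high_noise_capacity(power)
    rel_err = (approx - exact) / exact
    print(f"P/N = {power:7.3f}  exact = {exact:.6f}  approx = {approx:.6f}  rel. error = {rel_err:.4f}")
```

Since $\ln(1 + x) \le x$, the approximation overestimates the exact capacity, with a relative error of roughly $P/(2N)$ for small $P/N$.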



C. Simple lower bound on memoryless channel capacity

We have demonstrated in the previous sections that if the input to the memoryless channel is weak (in amplitude or power), the capacity is achieved by a discrete binary distribution. Therefore the channel capacity cannot be more than 1 bit. Note, however, that the capacity can be larger than 1 bit for channels with memory under certain circumstances, as we demonstrate in the Results section.

It follows from the proof in Appendix A that the Fisher information arises in Eq. (10) from Taylor-expanding the involved KL divergences in the expression for MI. A more precise approximation to channel capacity, $C_{\mathrm{bin}}$, can thus be obtained without Taylor expansions, just by substituting the discrete input distribution from Eq. (16) into Eq. (2),

$$C_{\mathrm{bin}} = \frac{1}{2} D_{\mathrm{KL}}\left[ f(r|\theta_0 - \Delta\theta) \,\|\, p(r) \right] + \frac{1}{2} D_{\mathrm{KL}}\left[ f(r|\theta_0 + \Delta\theta) \,\|\, p(r) \right], \tag{27}$$

where $p(r) = f(r|\theta_0 - \Delta\theta)/2 + f(r|\theta_0 + \Delta\theta)/2$. The parameter $\Delta\theta$ is half of the maximum input amplitude for amplitude-constrained channels, and $\Delta\theta = \sqrt{P}$ for power-constrained channels.

Eq. (27) is a lower bound on the true capacity, $C \ge C_{\mathrm{bin}}$, which holds whether the amplitude (or power) is small or not. The extension of Eq. (27) to channels with memory is not straightforward; for example, the calculation of $C_{\mathrm{bin}}$ would require numerical evaluation of possibly high-dimensional integrals, which may not be numerically stable [39]. Therefore, for channels with memory we propose to employ Eq. (10) as the simplest method.
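For a scalar channel the binary lower bound is a pair of one-dimensional integrals and can be evaluated directly. The sketch below is one possible numerical evaluation, assuming a unit-variance Gaussian channel with $\theta_0 = 0$ and a plain Riemann sum; the function names, grid width and step are assumptions made for illustration, and the simple grid is only adequate for moderate amplitudes (for very large amplitudes the densities underflow).

```python
import numpy as np

def gauss(r, mean):
    """Unit-variance Gaussian density f(r | theta = mean)."""
    return np.exp(-0.5 * (r - mean) ** 2) / np.sqrt(2.0 * np.pi)

def c_bin_awgn(delta_theta, tail=10.0, step=1e-3):
    """Binary-input lower bound on capacity (nats) for R = Theta + Z, Z ~ N(0, 1)."""
    r = np.arange(-(delta_theta + tail), delta_theta + tail + step, step)
    f_minus, f_plus = gauss(r, -delta_theta), gauss(r, delta_theta)
    p = 0.5 * (f_minus + f_plus)                      # response marginal
    kl_minus = np.sum(f_minus * np.log(f_minus / p)) * step
    kl_plus = np.sum(f_plus * np.log(f_plus / p)) * step
    return 0.5 * kl_minus + 0.5 * kl_plus

def c_high_awgn(delta_theta):
    """High-noise approximation (delta_theta)^2 * J / 2 with J = 1."""
    return 0.5 * delta_theta**2

for a in (0.05, 0.5, 1.0, 5.0):
    print(a, c_bin_awgn(a), c_high_awgn(a), np.log(2.0))
```

The binary bound follows the quadratic high-noise approximation at small amplitudes and saturates at $\ln 2$ nats (1 bit) at large amplitudes, whereas the quadratic expression grows without bound.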



IV. RESULTS FOR SELECTED SYSTEMS

A. Memoryless channels

1. Amplitude constrained linear AWGN channel

The capacity and capacity-bearing input distributions of the linear AWGN channel,

$$R = \Theta + Z, \tag{28}$$

where r.v. $Z$ is zero-mean Gaussian and the input is constrained in amplitude, were studied in detail in [18]. Contrary to the well known Eq. (26) for the input power constrained channel, no closed-form expression for capacity exists in the amplitude constrained version; moreover, the optimal input distribution is known to be discrete with a finite set of mass points.

We assume $\theta_0 = 0$ and the maximal input amplitude $2\Delta\theta$, thus the input is bound to lie in the interval $[-\Delta\theta, \Delta\theta]$. Furthermore, we assume that the power of the noise is $N = 1$, so the noise is described by the standard normal r.v., $Z \sim \mathcal{N}(0, 1)$. Eq. (15) then becomes

$$C_{\mathrm{high}} = \frac{1}{2} (\Delta\theta)^2. \tag{29}$$

The binary approximation, $C_{\mathrm{bin}}$, given by Eq. (27), has to be evaluated numerically. Additionally, we also investigate the low-noise approximation to MI, $C_{\mathrm{low}}$, which is also based on FI [12, 21-23],

$$C_{\mathrm{low}} = \ln \int_{\theta_0 - \Delta\theta}^{\theta_0 + \Delta\theta} \sqrt{\frac{J(\theta|R)}{2\pi e}}\, d\theta. \tag{30}$$

Eq. (30) is a lower bound on the true channel capacity, $C \ge C_{\mathrm{low}}$, which becomes tight as the noise in the information transmission vanishes. In the case of the amplitude-constrained AWGN channel we have

$$C_{\mathrm{low}} = \ln \frac{2\Delta\theta}{\sqrt{2\pi e}}. \tag{31}$$

Fig. 2a shows the comparison of the exact channel capacity (data taken from [16]) with $C_{\mathrm{high}}$, $C_{\mathrm{bin}}$ and $C_{\mathrm{low}}$, expressed as functions of the signal-to-noise ratio (in dB), which is defined as [16]

$$\mathrm{SNR} = 10 \log_{10}\left[ (\Delta\theta)^2 \right]. \tag{32}$$

The capacities are evaluated in bits, which means converting the natural logarithms in Eqs. (15), (27) and (30) to base 2, i.e., dividing the values by $\ln 2$. While $C_{\mathrm{low}}$ and $C_{\mathrm{high}}$ provide good approximations only for rather high and rather low SNR values, respectively, the $C_{\mathrm{bin}}$ approximation gives good results even for intermediate SNR values. A similar figure with additional approximations for the classical AWGN channel capacity can be found in [37].

2. Temporal neuronal coding

Recently, the information capacity of a memoryless neuronal model has been analyzed in detail [17]. It is assumed that the neuronal response $R$ is the interval between two consecutive action potentials. In agreement with some experimental observations [40-43], the response for each input follows the gamma distribution,

$$f(r|\theta) = \frac{r^{\kappa - 1} \exp(-r/\theta)}{\theta^{\kappa}\, \Gamma(\kappa)}, \tag{33}$$

where the parameter $\theta$ is assumed to be the input (stimulus intensity). Based on further experimental observations [44], the input is constrained in amplitude, $5/\kappa \le \theta \le 50/\kappa$. The exact capacity was calculated numerically by Ikeda and Manton [17] for $0.75 < \kappa < 4.5$.

While $C_{\mathrm{bin}}$ has to be evaluated numerically, for the high- and low-noise approximations we have

$$C_{\mathrm{high}} = \frac{81}{242}\, \kappa, \qquad C_{\mathrm{low}} = \ln\left[ \sqrt{\frac{\kappa}{2\pi e}}\, \ln 10 \right]. \tag{34}$$

The results are shown in Fig. 2b. For the investigated values of $\kappa$, both the $C_{\mathrm{high}}$ and $C_{\mathrm{bin}}$ approximations give better results than $C_{\mathrm{low}}$, which suggests that this particular case of temporal coding falls within the "high noise" category. Neuronal responses often vary substantially across identical stimulus trials, thus highly noisy information transmission is not unusual, as reported from experimental measurements [45]. A simple model of stochastic resonance in an electrosensory neuron subject to subthreshold (i.e., very weak) stimulation [25, 46] has recently been analyzed by employing $C_{\mathrm{high}}$ [24].
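The high- and low-noise approximations for the gamma model can be reproduced from the Fisher information of the scale parameter. The sketch below is a hedged illustration: it uses the standard result $J(\theta|R) = \kappa/\theta^2$ for the gamma density with shape $\kappa$ and scale $\theta$, takes the midpoint and half-width of the interval $[5/\kappa, 50/\kappa]$ as $\theta_0$ and $\Delta\theta$, and assumes that the low-noise approximation has the form $\ln \int \sqrt{J(\theta|R)/(2\pi e)}\, d\theta$ over the input range. The function names are hypothetical.

```python
import numpy as np

def c_high_gamma(kappa):
    """High-noise approximation (delta_theta)^2 * J(theta_0) / 2, in nats."""
    theta_lo, theta_hi = 5.0 / kappa, 50.0 / kappa
    theta_0 = 0.5 * (theta_lo + theta_hi)          # midpoint of the input range
    delta_theta = 0.5 * (theta_hi - theta_lo)      # half-width of the input range
    fisher = kappa / theta_0**2                    # FI of the gamma scale parameter
    return 0.5 * delta_theta**2 * fisher

def c_low_gamma(kappa):
    """Assumed low-noise form: ln of the integral of sqrt(J / (2 pi e)) over the range."""
    # The integral of sqrt(kappa)/theta from 5/kappa to 50/kappa is sqrt(kappa) * ln(10).
    return np.log(np.sqrt(kappa / (2.0 * np.pi * np.e)) * np.log(10.0))

for kappa in (0.75, 1.0, 2.0, 4.5):
    print(f"kappa = {kappa:4.2f}  C_high = {c_high_gamma(kappa) / np.log(2):.3f} bit"
          f"  C_low = {c_low_gamma(kappa) / np.log(2):.3f} bit")
```

Under these assumptions the high-noise value reduces to $81\kappa/242$ nats for any $\kappa$, and the low-noise value is negative for small $\kappa$, which indicates that the low-noise regime does not apply there.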
FIG. 2. Capacities and their approximations in memoryless channels: (a) amplitude constrained linear AWGN channel; (b) temporal coding, "gamma" neuron. The high-noise capacity approximation ($C_{\mathrm{high}}$, Eq. (13)) approximates the true capacity of the amplitude-constrained AWGN channel (a) well only for very low signal-to-noise ratios (SNR), just like the low-noise approximation ($C_{\mathrm{low}}$, Eq. (30)) does for high SNRs. The binary-channel approximation ($C_{\mathrm{bin}}$, Eq. (27)) holds well even for intermediate-to-low SNRs. The exact solution is taken from [16]. The information capacity of a simple model of neuronal coding (b) apparently falls into the high-noise category, since both $C_{\mathrm{high}}$ and $C_{\mathrm{bin}}$ approximate the true capacity (taken from [17]) better than $C_{\mathrm{low}}$.



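B. Linear Gaussian channel with memory and input power constraint

First, we demonstrate that Eq. (24) is compatible with the exact results available for input power constrained linear AGN channels with memory [1, 19] in the limit of weak input power. The channel is defined as

$$\mathbf{R} = \boldsymbol{\Theta} + \mathbf{Z}, \tag{35}$$

where the zero-mean input is constrained in power $P$ [1, p. 277],

$$P \ge \frac{1}{n} \operatorname{tr} \mathbf{C}_{\Theta}, \tag{36}$$

and the noise is given by the multivariate normal distribution with covariance matrix $\mathbf{C}_Z$, $\mathbf{Z} \sim \mathcal{N}(0, \mathbf{C}_Z)$. The channel conditional p.d.f. is therefore

$$f(\mathbf{r}|\boldsymbol{\theta}) = \frac{1}{\sqrt{(2\pi)^n \det \mathbf{C}_Z}} \exp\left[ -\frac{1}{2} (\mathbf{r} - \boldsymbol{\theta})^{\mathsf T} \mathbf{C}_Z^{-1} (\mathbf{r} - \boldsymbol{\theta}) \right], \tag{37}$$

and substituting Eq. (37) into Eq. (6) gives [1]

$$\mathbf{J}(\boldsymbol{\theta}|\mathbf{R}) = \mathbf{C}_Z^{-1}, \tag{38}$$

which is independent of $\boldsymbol{\theta}$.

From the spectral decomposition theorem [34] it follows that

$$\mathbf{C}_Z = \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}^{\mathsf T}, \tag{39}$$

where the matrix $\boldsymbol{\Lambda}$ is diagonal with positive elements $\lambda_i = [\boldsymbol{\Lambda}]_{ii}$ and $\mathbf{Q}$ is orthonormal. The capacity per channel use is then given by [19]

$$C = \frac{1}{2n} \sum_{i=1}^{n} \ln\left( 1 + \frac{m_i}{\lambda_i} \right), \tag{40}$$

where the constants $m_i \ge 0$ are determined by the water-filling procedure [1, p. 274], so that the power constraint given by Eq. (36) holds as $\sum_i m_i = nP$. Furthermore, the optimal input distribution is also multivariate normal, $\boldsymbol{\Theta} \sim \mathcal{N}(0, \mathbf{C}_{\Theta})$, with covariance matrix $\mathbf{C}_{\Theta} = \mathbf{Q} \mathbf{M} \mathbf{Q}^{\mathsf T}$ [1, p. 279], where the diagonal matrix $\mathbf{M}$ is defined as $[\mathbf{M}]_{ii} = m_i$.

In order to obtain the vanishing input power limit of Eq. (40), we observe that as $P \to 0$ also $m_i \to 0$, so we can expand Eq. (40) as

$$C \approx \frac{1}{2n} \sum_{i=1}^{n} \frac{m_i}{\lambda_i} = \frac{1}{2n} \operatorname{tr}\left( \boldsymbol{\Lambda}^{-1} \mathbf{M} \right). \tag{41}$$

By combining Eqs. (38), (39), (41) and basic properties of the matrix inverse and trace [34] we have

$$C \approx \frac{1}{2n} \operatorname{tr}\left[ (\mathbf{Q}^{\mathsf T} \mathbf{C}_Z \mathbf{Q})^{-1} \mathbf{M} \right] = \frac{1}{2n} \operatorname{tr}\left[ \mathbf{Q}^{\mathsf T} \mathbf{C}_Z^{-1} \mathbf{Q} \mathbf{M} \right] = \frac{1}{2n} \operatorname{tr}\left[ \mathbf{C}_Z^{-1} \mathbf{Q} \mathbf{M} \mathbf{Q}^{\mathsf T} \right] = \frac{1}{2n} \operatorname{tr}\left[ \mathbf{J}(\boldsymbol{\theta}|\mathbf{R})\, \mathbf{C}_{\Theta} \right], \tag{42}$$

which corresponds to the capacity per channel use as $n \to \infty$, due to Eq. (24), for power achieving input, $\operatorname{tr} \mathbf{C}_{\Theta}/n = P$.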
Next, we illustrate Eq. (42) on two simple models of Gaussian noise with memory.



1. AR(1) noise

The channel is given by Eqs. (35) and (36), with the $Z_i$'s following the AR(1) process: $Z_i = \varrho Z_{i-1} + X_i$, where $-1 < \varrho < 1$ is the correlation coefficient, $\varrho = \mathrm{corr}(Z_i, Z_{i-1})$, and $X_i$ are independent and identically distributed (i.i.d.) standard normal r.v.'s, $X_i \sim \mathcal{N}(0, 1)$ [32]. The noise covariance matrix has elements

$$[\mathbf{C}_Z]_{ik} = \varrho^{|i-k|}, \tag{43}$$

and its inverse, equal to the FI matrix by Eq. (38), is tridiagonal,

$$\mathbf{J}(\boldsymbol{\theta}|\mathbf{R}) = \frac{1}{1 - \varrho^2}
\begin{pmatrix}
1 & -\varrho & & & 0 \\
-\varrho & 1 + \varrho^2 & -\varrho & & \\
 & -\varrho & \ddots & \ddots & \\
 & & \ddots & 1 + \varrho^2 & -\varrho \\
0 & & & -\varrho & 1
\end{pmatrix}. \tag{44}$$

We denote the correlation coefficient between consecutive inputs as $c = \mathrm{corr}(\Theta_i, \Theta_{i+1})$. The MI per channel use for the maximum power achieving input, $P = \operatorname{tr} \mathbf{C}_{\Theta}/n$, can be found exactly by employing Eqs. (24) and (44),

$$\lim_{n \to \infty} \frac{1}{n} I(\boldsymbol{\Theta}; \mathbf{R}) = \frac{P}{2}\, \frac{\varrho^2 + 1 - 2c\varrho}{1 - \varrho^2}. \tag{45}$$

For $\varrho = 0$ (memoryless channel) the value of $c$ does not matter, as discussed earlier. The capacity per channel use is

$$C_{\mathrm{high}} = \frac{P}{2}\, \frac{\varrho^2 + 1 + 2|\varrho|}{1 - \varrho^2}, \tag{46}$$

since $\sup_{-1 < c < 1} (-c\varrho) = |\varrho|$. The capacity in bits per vanishing input power, $C_{\mathrm{high}}/P$, is shown in Fig. 3 as a function of the noise correlation $\varrho$. Note that it follows from Eq. (46) that $C_{\mathrm{high}}/P \to \infty$ as $|\varrho| \to 1$, i.e., as the noise correlation increases, its corrupting power decreases, and in the limit we can approach the noiseless channel.
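The closed-form AR(1) result can be compared with a direct finite-$n$ evaluation of the trace formula. The sketch below makes two assumptions that should be read as illustrative: the noise covariance has elements $\varrho^{|i-k|}$ (unit marginal variance), and the input covariance is taken as $P\,c^{|i-k|}$, which fixes the lag-one correlation to $c$; only that lag matters because the inverse noise covariance is tridiagonal. The function names and the value of $n$ are chosen here.

```python
import numpy as np

def geometric_corr_matrix(rho, n):
    """Matrix with elements rho^{|i-k|}."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def mi_per_use(rho, c, power, n=400):
    """Finite-n value of tr(J C_theta) / (2n) with J = inverse noise covariance."""
    fisher = np.linalg.inv(geometric_corr_matrix(rho, n))
    cov_input = power * geometric_corr_matrix(c, n)   # assumed input correlation structure
    return 0.5 * np.trace(fisher @ cov_input) / n

def mi_per_use_limit(rho, c, power):
    """Closed-form large-n limit for lag-one input correlation c."""
    return 0.5 * power * (rho**2 + 1.0 - 2.0 * c * rho) / (1.0 - rho**2)

rho, power = 0.5, 0.01
for c in (-1.0, 0.0, 1.0):
    print(c, mi_per_use(rho, c, power), mi_per_use_limit(rho, c, power))
```

For positive noise correlation the largest value is obtained for anticorrelated consecutive inputs ($c = -1$), in line with the supremum over $c$ used for the capacity; the finite-$n$ value differs from the limit by boundary terms of order $1/n$.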



2. MA(1) noise

The channel is given by Eqs. (35) and (36). The r.v.'s $Z_i$ follow the MA(1) process, $Z_i = X_i - \gamma X_{i-1}$, where $-1 < \gamma < 1$ is the parameter of the process and $X_i$ are i.i.d., $X_i \sim \mathcal{N}(0, 1)$. The parameter of the MA(1) process and the correlation coefficient $\varrho = \mathrm{corr}(Z_i, Z_{i-1})$ are related as $\varrho = -\gamma/(1 + \gamma^2)$, and therefore $-0.5 < \varrho < 0.5$ [32]. The covariance matrix of the MA(1) process is tridiagonal, and its inverse has all elements non-zero, although decreasing in absolute value with the distance from the main diagonal, see Fig. 4a, b.

Recently, a closed-form expression for $\mathbf{C}_Z^{-1}$ of the MA(1) process has been published [47]. The expression is rather complicated and we cannot evaluate the limit analogous to Eq. (45) in a closed form. Nevertheless, we approximate the capacity per channel use by considering $n$ high enough, and the closed-form expression for the elements of the FI matrix allows us to avoid numerical issues when inverting the covariance matrix. The capacity per vanishing input power, $C_{\mathrm{high}}/P$, is shown in Fig. 3. Note that for $n < 2000$ we were unable to obtain stable values of $C_{\mathrm{high}}$ for $|\varrho| > 4.2$. This is caused by the fact that the dominant terms of the FI matrix, and consequently $C_{\mathrm{high}}/P$, diverge to $+\infty$ as $|\varrho| \to 0.5$ (in a similar way as Eq. (46) does for $|\varrho| \to 1$). In other words, the dependence structure of the MA(1) process is sufficiently "rigid" even for intermediate correlation values, so that by properly matching the input correlations we can approach noiseless information transfer. Examples of optimally matched input signals are shown in Fig. 4c, d, e.

FIG. 3. The capacities per vanishing input power for the AR(1) and MA(1) Gaussian additive noise models as functions of the noise correlation coefficient $\varrho$ (the graphs are symmetric in $\varrho$; vertical axis: capacity per vanishing input power). Note that the capacity tends to infinity as $|\varrho| \to 0.5$ (the MA(1) model) and as $|\varrho| \to 1$ (the AR(1) model). In these limits, the corrupting power of the noise in the information transfer is decreased to the point that the channel approaches the noiseless channel and the input value can be recovered perfectly.



V. CONCLUSIONS 

We derive approximate expression for mutual informa- 
tion in a broad class of discrete-time stationary chan- 
nels (including those with memory) with continuous, but 
small, input. The input is restricted either in amplitude 
or in power and we study the optimality conditions on 
information transfer as the power or amplitude approach 
zero. We find that the input and channel properties are 
separated in the approximate formula, which allows us 
to study the optimality conditions in a convenient way. 
Specifically, we find that the increase of mutual informa- 
tion from zero power (or amplitude) for a given channel 
depends only on the input covariances. 

[Figure 4, image not reproduced. Panel titles: (a) FI matrix, $\rho = +0.45$; (b) FI matrix, $\rho = -0.45$ (axis label: Column); (c) optimal signalling, noise $\rho = 0$; (d) optimal signalling, noise $\rho < 0$; (e) optimal signalling, noise $\rho > 0$ (axis label: discrete time steps).]

FIG. 4. Small input amplitude optimality conditions for linear channels
with AR(1) or MA(1) additive Gaussian noise. The structure of the Fisher
information matrix of the MA(1) model (panels (a) and (b), for $n = 50$)
shows elements decaying in absolute value with distance from the main
diagonal; sign changes occur for a positively correlated MA(1) process,
and all elements are positive for $\rho < 0$. The structure of the FI
matrix determines the covariance matrix of the optimal signal. Panel (c)
shows an example of the optimal input to the memoryless channel (noise
correlation $\rho = 0$): random switching between input values
$+\sqrt{P}$ and $-\sqrt{P}$ (discrete binary input), where $P$ is the
input power constraint. The same capacity would be achieved by input
values described by the normal distribution $\mathcal{N}(0, P)$, as
discussed in the text. Depending on the sign of the noise correlation
$\rho$, the optimal input is characterized by an extremal value of the
correlation between consecutive inputs (panels (d) and (e)). Note that
the capacity of the memoryless channel is also achieved by (d) and (e),
independently of the input correlations.

For memoryless channels, the capacity cannot exceed 1 bit per channel
use, and the optimal input is a unique discrete binary distribution in
the small input amplitude case, but generally non-unique in the small
input power case. We demonstrate that the effect of input correlations
in memoryless channels is of higher order than the order of the capacity
approximation, and thus the additional correlations do not decrease the
capacity, although they decrease the input entropy. We also provide a
simple lower bound on the capacity of memoryless channels subject to
weak-stimulus constraints that gives better results in practical
situations.
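As a numerical illustration of the non-uniqueness in the small input power case, the following minimal sketch compares a binary input $\pm\sqrt{P}$ with a Gaussian input $\mathcal{N}(0, P)$ on a scalar memoryless channel with additive Gaussian noise. The channel, the noise level `sigma`, the integration grid and all function names are assumptions made for this example; they are not taken from the text. For this channel the Fisher information is $1/\sigma^2$, so the weak-signal value is $P/(2\sigma^2)$ nats.

```python
import numpy as np

def gaussian_pdf(r, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at r."""
    return np.exp(-0.5 * ((r - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def binary_input_information(power, sigma=1.0, half_width=12.0, n_grid=48001):
    """Mutual information (nats) for equiprobable inputs +sqrt(P) and -sqrt(P)."""
    amplitude = np.sqrt(power)
    r = np.linspace(-half_width * sigma, half_width * sigma, n_grid)
    dr = r[1] - r[0]
    f_plus = gaussian_pdf(r, +amplitude, sigma)
    f_minus = gaussian_pdf(r, -amplitude, sigma)
    mixture = 0.5 * (f_plus + f_minus)  # marginal output density
    # Average KL divergence between conditional and marginal densities,
    # evaluated with a plain Riemann sum on the grid.
    kl_plus = np.sum(f_plus * np.log(f_plus / mixture)) * dr
    kl_minus = np.sum(f_minus * np.log(f_minus / mixture)) * dr
    return 0.5 * (kl_plus + kl_minus)

def gaussian_input_information(power, sigma=1.0):
    """Closed-form mutual information (nats) for a N(0, P) input."""
    return 0.5 * np.log1p(power / sigma**2)

def weak_signal_approximation(power, sigma=1.0):
    """Second-order value 0.5 * J * C with J = 1/sigma^2 and C = P."""
    return 0.5 * power / sigma**2

for power in (0.04, 0.01, 0.0025):
    print(power,
          binary_input_information(power),
          gaussian_input_information(power),
          weak_signal_approximation(power))
```

Under these assumptions both inputs approach the same weak-signal value as $P$ decreases, since they share the same variance.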

In channels with memory, the capacity can be greater than 1 bit and the
input correlations play the most important role. We show that the
approximate formula includes the small input power limit of the exact
solution for linear additive Gaussian noise channels with memory. We
also show that, by properly matching the input covariances to the
dependence structure of the noise, we can in certain cases approach the
noiseless channel even for intermediate values of the noise
correlations.
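The matching of input covariances to the noise structure can be sketched numerically. The example below assumes a linear channel with additive Gaussian noise whose covariance is tridiagonal (unit variance, lag-1 correlation `rho`, as for an MA(1) process), for which the FI matrix with respect to the input mean is the inverse noise covariance. It then evaluates the second-order expression $\frac{1}{2}\mathrm{tr}(\mathbf{J}\mathbf{C})$ for an uncorrelated input and for a rank-one input whose sign pattern follows the first row of the FI matrix. The amplitude constraint $|\delta\theta_i| \le A$, the parametrization by `rho` and all function names are assumptions of this sketch.

```python
import numpy as np

def ma1_noise_covariance(n, rho):
    """Tridiagonal Toeplitz covariance: unit variance, lag-1 correlation rho."""
    return np.eye(n) + rho * (np.eye(n, k=1) + np.eye(n, k=-1))

def fisher_information(noise_cov):
    """FI matrix for the mean of additive Gaussian noise (inverse covariance)."""
    return np.linalg.inv(noise_cov)

def weak_signal_information(fi, input_cov):
    """Second-order approximation 0.5 * tr(J C), in nats."""
    return 0.5 * np.trace(fi @ input_cov)

def compare_inputs(n, rho, amplitude):
    """Return the FI matrix and the approximation for white and matched inputs."""
    fi = fisher_information(ma1_noise_covariance(n, rho))
    white_cov = amplitude**2 * np.eye(n)
    # Matched input: +/- amplitude * pattern with a random global sign, so the
    # covariance is amplitude^2 * pattern pattern^T (rank one, valid covariance).
    pattern = np.sign(fi[0])
    matched_cov = amplitude**2 * np.outer(pattern, pattern)
    return (fi,
            weak_signal_information(fi, white_cov),
            weak_signal_information(fi, matched_cov))

for rho in (+0.45, -0.45):
    fi, white, matched = compare_inputs(n=50, rho=rho, amplitude=0.1)
    print(rho, "signs of first FI row:", np.sign(fi[0, :4]),
          "white:", white, "matched:", matched)
```

In this sketch the matched input alternates in sign for positive `rho` and is constant for negative `rho`, which corresponds to an extremal correlation between consecutive inputs.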
Appendix A: Capacity in the vanishing input amplitude 

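We introduce an auxiliary r.v. $\delta\Theta$, building on the
main-text definitions, as

$$\delta\Theta = \Theta - \theta_0, \tag{A1}$$

so that for all $i$ it holds that $\delta\Theta_i \in [-\Delta\theta_i, \Delta\theta_i]$.
The p.d.f. of r.v. $\delta\Theta$ is denoted as $\pi(\delta\theta)$. The
mutual information $I(\Theta; R)$ from Eq. (2) can be written in terms
of r.v. $\delta\Theta$, whether $\|\Delta\theta\|$ is small or not, as

$$I(\Theta; R) = \left\langle D_{KL}\left[ f(r|\theta_0 + \delta\theta) \,\middle\|\, \langle f(r|\theta_0 + \delta\theta)\rangle_{\delta\theta} \right] \right\rangle_{\delta\theta}. \tag{A2}$$

In order to approximate $I(\Theta; R)$ around $\theta_0$ in terms of
$\delta\theta$ for small $\|\Delta\theta\|$, we need to expand the KL
distance in Eq. (A2). We introduce

$$\varphi(r, \theta_0 + \delta\theta) = f(r|\theta_0 + \delta\theta) \ln f(r|\theta_0 + \delta\theta), \tag{A3}$$

$$\psi(r, \theta_0 + \delta\theta) = f(r|\theta_0 + \delta\theta) \ln \langle f(r|\theta_0 + \delta\theta)\rangle_{\delta\theta} \tag{A4}$$

and rewrite the KL distance as

$$D_{KL}\left[ f(r|\theta_0 + \delta\theta) \,\middle\|\, \langle f(r|\theta_0 + \delta\theta)\rangle_{\delta\theta} \right] = \int \left[ \varphi(r, \theta_0 + \delta\theta) - \psi(r, \theta_0 + \delta\theta) \right] dr, \tag{A5}$$

thus reducing the problem to expanding $\varphi(r, \theta)$ and
$\psi(r, \theta)$. While the Taylor expansion of $\varphi(r, \theta)$ is
straightforward, the expansion of the logarithm of the expected value of
$f(r|\theta)$ in $\psi(r, \theta)$ is examined in the following Lemma.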

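Lemma 1. Let $f(r|\theta)$ be twice continuously differentiable with
respect to $\theta$. Then for a chosen $\theta_0$, r.v.
$\delta\Theta \sim \pi(\delta\theta)$ and $\Delta\theta$ such that for
all $i$ it holds that $\Delta\theta_i > 0$ and
$-\Delta\theta_i < \delta\theta_i < \Delta\theta_i$, there exists
$P > 0$ such that the following approximation holds for small enough
$\|\Delta\theta\|$:

$$\ln \langle f(r|\theta_0 + \delta\theta)\rangle_{\delta\theta} \approx \ln f(r|\theta_0) + \langle\delta\theta\rangle^{T} \frac{\nabla f(r|\theta_0)}{f(r|\theta_0)}, \tag{A6}$$

where $\nabla f(r|\theta_0) = \nabla f(r|\theta)|_{\theta=\theta_0}$, the
gradient is taken with respect to $\theta$, and
$\langle\delta\theta\rangle = \langle\delta\theta\rangle_{\delta\theta}$
is the expectation of r.v. $\delta\Theta$. The maximum error of the
expansion (A6) is bounded by $P\|\Delta\theta\|^2$.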
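The quadratic error bound of Lemma 1 can be illustrated on a simple case. The sketch below assumes a scalar Gaussian likelihood $f(r|\theta) = \mathcal{N}(r; \theta, \sigma^2)$, a fixed response value `r`, and a two-point distribution of $\delta\theta$ with non-zero mean inside $[-\Delta\theta, \Delta\theta]$; all of these choices and the function names are hypothetical and serve only as an example. Halving $\Delta\theta$ should reduce the error of the first-order expansion by a factor close to four.

```python
import numpy as np

def log_likelihood(r, theta, sigma=1.0):
    """ln f(r|theta) for a Gaussian channel, R ~ N(theta, sigma^2)."""
    return -0.5 * ((r - theta) / sigma) ** 2 - 0.5 * np.log(2.0 * np.pi * sigma**2)

def lemma_error(delta, r=1.5, theta0=0.0, sigma=1.0):
    """Absolute error of the first-order expansion of ln <f(r|theta0 + dtheta)>."""
    offsets = np.array([-0.5 * delta, delta])  # both lie inside [-delta, delta]
    probs = np.array([0.5, 0.5])
    mean_offset = probs @ offsets              # <dtheta>, non-zero on purpose
    # Left-hand side: logarithm of the averaged likelihood.
    exact = np.log(probs @ np.exp(log_likelihood(r, theta0 + offsets, sigma)))
    # Right-hand side: ln f + <dtheta> * (grad f / f), both taken at theta0.
    score = (r - theta0) / sigma**2
    first_order = log_likelihood(r, theta0, sigma) + mean_offset * score
    return abs(exact - first_order)

for delta in (0.04, 0.02, 0.01):
    print(delta, lemma_error(delta), lemma_error(delta) / delta**2)
```

The last printed column stays roughly constant under these assumptions, which is the behaviour expected from an error of order $\|\Delta\theta\|^2$.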

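Proof. From the continuity of the second derivatives of $f(r|\theta)$
around $\theta_0$ it follows that

$$\left| \frac{\partial^2 f(r|\theta)}{\partial\theta_i \, \partial\theta_j} \right| < M, \tag{A7}$$

for all $i, j$. The Taylor expansion of $f(r|\theta)$ around $\theta_0$
in terms of $\delta\theta$ reads

$$f(r|\theta_0 + \delta\theta) \approx f(r|\theta_0) + \delta\theta^{T} \nabla f(r|\theta_0), \tag{A8}$$

and furthermore

$$\left| f(r|\theta_0 + \delta\theta) - f(r|\theta_0) - \delta\theta^{T} \nabla f(r|\theta_0) \right| \le nM\|\delta\theta\|^2 \le C\|\Delta\theta\|^2. \tag{A9}$$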



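By integrating the expansion (A8), i.e., by taking the expectation with
respect to r.v. $\delta\Theta$, and by employing inequality (A9), it can
be established that

$$\begin{aligned}
&\left| \int_{\mathbb{R}} \pi(\delta\theta) f(r|\theta_0 + \delta\theta)\, d(\delta\theta) - f(r|\theta_0) - \langle\delta\theta\rangle^{T} \nabla f(r|\theta_0) \right| \\
&\quad = \left| \int_{\mathbb{R}} \pi(\delta\theta) \left[ f(r|\theta_0 + \delta\theta) - f(r|\theta_0) - \delta\theta^{T} \nabla f(r|\theta_0) \right] d(\delta\theta) \right| \\
&\quad \le \int_{\mathbb{R}} \pi(\delta\theta)\, C\|\Delta\theta\|^2 \, d(\delta\theta) = C\|\Delta\theta\|^2,
\end{aligned} \tag{A10}$$

and therefore the following expansion holds

$$\int_{\mathbb{R}} \pi(\delta\theta) f(r|\theta_0 + \delta\theta)\, d(\delta\theta) \approx f(r|\theta_0) + \langle\delta\theta\rangle^{T} \nabla f(r|\theta_0), \tag{A11}$$

with the maximum error of order $\|\Delta\theta\|^2$. From the Lagrange
mean value theorem it follows that for $A, B > 0$

$$|\ln A - \ln B| \le \frac{1}{\min(A, B)} |A - B|. \tag{A12}$$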



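We set $A = \int_{\mathbb{R}} \pi(\delta\theta) f(r|\theta_0 + \delta\theta)\, d(\delta\theta)$
and $B = f(r|\theta_0) + \langle\delta\theta\rangle^{T} \nabla f(r|\theta_0)$,
and combine the inequalities (A10) and (A12) to obtain

$$\begin{aligned}
|\ln A - \ln B| &= \left| \ln \int_{\mathbb{R}} \pi(\delta\theta) f(r|\theta_0 + \delta\theta)\, d(\delta\theta) - \ln\left[ f(r|\theta_0) + \langle\delta\theta\rangle^{T} \nabla f(r|\theta_0) \right] \right| \\
&\le \frac{1}{\min(A, B)} |A - B| \le \frac{1}{\min(A, B)} C\|\Delta\theta\|^2,
\end{aligned} \tag{A13}$$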



where $\min(A, B)$ is finite due to the regularity of $f(r|\theta)$.
From the Taylor expansion of $\ln(a + x)$ around $a$ in terms of $x$ and
the expression for the Lagrange remainder [48] we have

$$\left| \ln(a + x) - \ln(a) - \frac{x}{a} \right| \le \frac{x^2}{a^2}. \tag{A14}$$



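Setting $a = f(r|\theta_0)$ and $x = \langle\delta\theta\rangle^{T} \nabla f(r|\theta_0)$ thus gives

$$\left| \ln\left[ f(r|\theta_0) + \langle\delta\theta\rangle^{T} \nabla f(r|\theta_0) \right] - \ln f(r|\theta_0) - \frac{\langle\delta\theta\rangle^{T} \nabla f(r|\theta_0)}{f(r|\theta_0)} \right| \le \frac{\|\nabla f(r|\theta_0)\|^2}{f^2(r|\theta_0)} \|\Delta\theta\|^2. \tag{A15}$$

Finally, we apply the triangle inequality for the absolute value,
$|\alpha - \beta| \le |\alpha - \gamma| + |\gamma - \beta|$, setting

$$\alpha = \ln A = \ln \int_{\mathbb{R}} \pi(\delta\theta) f(r|\theta_0 + \delta\theta)\, d(\delta\theta), \qquad \beta = \ln f(r|\theta_0) + \frac{\langle\delta\theta\rangle^{T} \nabla f(r|\theta_0)}{f(r|\theta_0)}, \tag{A16}$$

$$\gamma = \ln B = \ln\left[ f(r|\theta_0) + \langle\delta\theta\rangle^{T} \nabla f(r|\theta_0) \right], \tag{A17}$$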
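and by combining inequalities (A13) and (A15) we obtain

$$\begin{aligned}
&\left| \ln \int_{\mathbb{R}} \pi(\delta\theta) f(r|\theta_0 + \delta\theta)\, d(\delta\theta) - \ln f(r|\theta_0) - \frac{\langle\delta\theta\rangle^{T} \nabla f(r|\theta_0)}{f(r|\theta_0)} \right| \\
&\quad \le \left| \ln \int_{\mathbb{R}} \pi(\delta\theta) f(r|\theta_0 + \delta\theta)\, d(\delta\theta) - \ln\left[ f(r|\theta_0) + \langle\delta\theta\rangle^{T} \nabla f(r|\theta_0) \right] \right| \\
&\qquad + \left| \ln\left[ f(r|\theta_0) + \langle\delta\theta\rangle^{T} \nabla f(r|\theta_0) \right] - \ln f(r|\theta_0) - \frac{\langle\delta\theta\rangle^{T} \nabla f(r|\theta_0)}{f(r|\theta_0)} \right| \\
&\quad \le \frac{1}{\min(A, B)} C\|\Delta\theta\|^2 + \frac{\|\nabla f(r|\theta_0)\|^2}{f^2(r|\theta_0)} \|\Delta\theta\|^2 = P\|\Delta\theta\|^2,
\end{aligned} \tag{A18}$$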



and therefore 



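$$\ln \langle f(r|\theta_0 + \delta\theta)\rangle_{\delta\theta} \approx \ln f(r|\theta_0) + \langle\delta\theta\rangle^{T} \frac{\nabla f(r|\theta_0)}{f(r|\theta_0)}, \tag{A19}$$

with error of order $\|\Delta\theta\|^2$. □

In the following we set $\varphi = \varphi(r, \theta_0 + \delta\theta)$,
$\psi = \psi(r, \theta_0 + \delta\theta)$, $f = f(r|\theta_0)$ and
$\nabla f = \nabla f(r|\theta)|_{\theta=\theta_0}$ for shorthand, and by
repeatedly applying Lemma 1 and keeping in mind the rules for
derivatives $(fg)'' = f''g + 2f'g' + fg''$ and
$(\ln f)'' = f''/f - (f'/f)^2$, we obtain the expansions

$$\begin{aligned}
\varphi \approx{}& f \ln f + \delta\theta^{T} \ln f \, \nabla f + \delta\theta^{T} \nabla f + \frac{1}{2} \delta\theta^{T} \ln f \, \nabla\nabla^{T} f \, \delta\theta + \delta\theta^{T} \frac{\nabla f \nabla^{T} f}{f} \delta\theta \\
&+ \frac{1}{2} \delta\theta^{T} f \left[ \frac{\nabla\nabla^{T} f}{f} - \frac{\nabla f \nabla^{T} f}{f^2} \right] \delta\theta,
\end{aligned} \tag{A20}$$

$$\begin{aligned}
\psi \approx{}& f \ln f + \delta\theta^{T} \ln f \, \nabla f + \langle\delta\theta\rangle^{T} \nabla f + \frac{1}{2} \delta\theta^{T} \ln f \, \nabla\nabla^{T} f \, \delta\theta + \delta\theta^{T} \frac{\nabla f \nabla^{T} f}{f} \langle\delta\theta\rangle \\
&+ \frac{1}{2} \langle\delta\theta\rangle^{T} f \left[ \frac{\nabla\nabla^{T} f}{f} - \frac{\nabla f \nabla^{T} f}{f^2} \right] \langle\delta\theta\rangle.
\end{aligned} \tag{A21}$$

We substitute these expansions into Eq. (A5), and by applying the
regularity conditions stated in the main text we have

$$\int [\varphi - \psi]\, dr \approx \frac{1}{2} \delta\theta^{T} \mathbf{J}(\theta_0|R)\, \delta\theta - \delta\theta^{T} \mathbf{J}(\theta_0|R) \langle\delta\theta\rangle + \frac{1}{2} \langle\delta\theta\rangle^{T} \mathbf{J}(\theta_0|R) \langle\delta\theta\rangle, \tag{A22}$$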
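The quadratic form in Eq. (A22) can be compared with a directly computed KL distance in a simple setting. The following sketch assumes a scalar Gaussian channel $R \sim \mathcal{N}(\theta_0 + \delta\theta, \sigma^2)$ with $\theta_0 = 0$, for which the Fisher information is $1/\sigma^2$, and a two-point distribution of $\delta\theta$ with unequal weights. The channel, the weights, the integration grid and the function names are assumptions made for illustration only.

```python
import numpy as np

def gaussian_pdf(r, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at r."""
    return np.exp(-0.5 * ((r - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def kl_and_quadratic_form(delta, sigma=1.0, half_width=12.0, n_grid=48001):
    """KL distance to the averaged density vs. the quadratic form of Eq. (A22)."""
    offsets = np.array([-0.5 * delta, delta])  # possible values of dtheta
    probs = np.array([0.3, 0.7])               # unequal weights, non-zero mean
    mean_offset = probs @ offsets
    fisher = 1.0 / sigma**2                    # FI of the Gaussian location model

    r = np.linspace(-half_width * sigma, half_width * sigma, n_grid)
    dr = r[1] - r[0]
    conditionals = [gaussian_pdf(r, d, sigma) for d in offsets]
    mixture = probs[0] * conditionals[0] + probs[1] * conditionals[1]
    # Left-hand side: numerical KL distance for each value of dtheta.
    kl = np.array([np.sum(f * np.log(f / mixture)) * dr for f in conditionals])
    # Right-hand side: the three terms of the quadratic form, one per dtheta.
    quad = (0.5 * fisher * offsets**2
            - fisher * offsets * mean_offset
            + 0.5 * fisher * mean_offset**2)
    return kl, quad

kl, quad = kl_and_quadratic_form(delta=0.05)
print("KL distances:   ", kl)
print("quadratic forms:", quad)
```

Averaging either row with the weights `probs` gives the corresponding estimate of the mutual information for this hypothetical input.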

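where we employed the definition of the Fisher information matrix from
the main text, with $\mathbf{J}(\theta_0|R) = \mathbf{J}(\theta|R)|_{\theta=\theta_0}$.
Due to the symmetry $\mathbf{J}(\theta_0|R) = [\mathbf{J}(\theta_0|R)]^{T}$, it holds that

$$\delta\theta^{T} \mathbf{J}(\theta_0|R) \langle\delta\theta\rangle = \frac{1}{2} \left[ \delta\theta^{T} \mathbf{J}(\theta_0|R) \langle\delta\theta\rangle + \langle\delta\theta\rangle^{T} \mathbf{J}(\theta_0|R)\, \delta\theta \right] \tag{A23}$$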

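and so from Eq. (A2) we have

$$I(\Theta; R) \approx \frac{1}{2} \left\langle [\delta\theta - \langle\delta\theta\rangle]^{T} \mathbf{J}(\theta_0|R)\, [\delta\theta - \langle\delta\theta\rangle] \right\rangle_{\delta\theta}. \tag{A24}$$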

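The covariance matrix $\mathbf{C}_{\delta\Theta}$ of r.v. $\delta\Theta$ is defined as

$$\mathbf{C}_{\delta\Theta} = \left\langle [\delta\theta - \langle\delta\theta\rangle] [\delta\theta - \langle\delta\theta\rangle]^{T} \right\rangle_{\delta\theta}, \tag{A25}$$

and obviously $\mathbf{C}_{\delta\Theta} = \mathbf{C}_{\delta\Theta}^{T}$.
Since $\theta_0$ is fixed, and $\Theta = \delta\Theta + \theta_0$, the
covariance matrices of r.v. $\Theta$ and r.v. $\delta\Theta$ are equal,
$\mathbf{C}_{\Theta} = \mathbf{C}_{\delta\Theta}$. Furthermore, the law
of matrix multiplication gives
$[\mathbf{A}\mathbf{B}]_{ik} = \sum_j [\mathbf{A}]_{ij} [\mathbf{B}]_{jk}$,
thus summing along $i = k$ gives the trace, i.e.,
$\mathrm{tr}(\mathbf{A}\mathbf{B}) = \sum_i [\mathbf{A}\mathbf{B}]_{ii} = \sum_{i,j} [\mathbf{A}]_{ij} [\mathbf{B}]_{ji}$.
Therefore, Eq. (A24) can be written in the compact form
$I(\Theta; R) \approx \frac{1}{2} \mathrm{tr}\left[ \mathbf{J}(\theta_0|R)\, \mathbf{C}_{\Theta} \right]$
stated in the main text.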
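The step from the averaged quadratic form in Eq. (A24) to a trace against the covariance matrix of Eq. (A25) is a purely algebraic identity, which the following minimal sketch checks on random samples. The matrix `fisher` is an arbitrary symmetric positive semi-definite stand-in rather than the FI matrix of any particular channel, and the sample distribution, sizes and function name are likewise assumptions of the example.

```python
import numpy as np

def quadratic_and_trace_forms(seed=0, n=4, n_samples=1000, amplitude=0.01):
    """Evaluate both sides of <(x - m)^T J (x - m)> / 2 = tr(J C) / 2 on samples."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(n, n))
    fisher = a @ a.T  # arbitrary symmetric PSD stand-in for J
    # Samples of dtheta confined to [-amplitude, amplitude] in every component.
    delta_theta = amplitude * rng.uniform(-1.0, 1.0, size=(n_samples, n))

    centered = delta_theta - delta_theta.mean(axis=0)
    # Empirical average of the quadratic form, one value per sample.
    per_sample = np.einsum("si,ij,sj->s", centered, fisher, centered)
    quadratic_average = 0.5 * per_sample.mean()
    # Empirical covariance (normalised by n_samples) and the trace form.
    input_cov = centered.T @ centered / n_samples
    trace_form = 0.5 * np.trace(fisher @ input_cov)
    return quadratic_average, trace_form

quadratic_average, trace_form = quadratic_and_trace_forms()
print(quadratic_average, trace_form)
```

Because the same empirical mean and normalisation are used on both sides, the two numbers agree up to floating-point rounding.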